Abstract — This paper introduces the concept of incremental
traceback for determining changes in the trace of a network as it
evolves with time. A distributed algorithm, based on the methodology
of algebraic traceback developed by Dean et al., is proposed
which can completely determine a path of $d$ nodes/routers ($d \in \mathbb{N}$)
using $O(d)$ marked packets, and subsequently determine the
changes in its topology using $O(\log d)$ marked packets with high
probability. The algorithm is established to be order-wise optimal,
i.e., no other distributed algorithm can determine changes in the
path topology using a lower order of bits (i.e., marked packets).
The algorithm is shown to have a computational complexity of
$O(d \log d)$, which is significantly less than that of any existing
non-incremental algorithm of algebraic traceback. Extensions of
this algorithm to settings with node identity spoofing and network
coding are also presented.

Index Terms — Incremental traceback, MANETs. 

I. Introduction 

Given the increasing number and forms of attacks on net- 
works in recent years, developing efficient counter-measures, 
such as traceback, is of significant value. In this paper, we 
focus on determining efficient traceback mechanisms for net- 
works with time-varying topologies. Settings such as mobile 
ad-hoc networks (MANETs) are of particular interest in which 
we desire to use traceback towards network management 
and countering attacks such as denial-of-service (DoS) attacks.
A DoS attack is arguably one of the most common forms of
attack on both wire-line and wireless networks, where either
a single attacker or multiple distributed attackers "flood" a
victim's link with random packets to disrupt the delivery of
legitimate packets. For the Internet, IP traceback is one of
the possible mechanisms for determining the source of this
attack [1], [7]. Similarly, generalized (not necessarily IP-based)
traceback proves useful in determining the origin of attacks
for MANETs. An important point to note is that traceback
may prove useful for purposes other than countering distributed
DoS attacks. For instance, it can be used for network
maintenance purposes [3], for source/route verification, and to
determine the location of faulty nodes in the network.

Traceback mechanisms have been traditionally studied for 
IP-based networks under the name of IP traceback ||T|. The 
common goal in traceback literature is to perform a post-attack 
traceback for an IP-based network to determine the source(s) 
of the attack. Our paper's focus is on dynamic networks (which 
may or may not be IP-based) where traceback is preemptively 
performed to manage the network and deter possible attacks. 
To this end, we desire that the traceback mechanism be 
efficient and be able to track changes in the traces quickly with
minimal computation. In this paper, we develop an incremental
traceback mechanism which, after initialization, requires a low 
packet and computational overhead to detect and determine 
changes in traces of the network. 

A. Background on Traceback 

As mentioned earlier, a large body of literature on traceback 
focuses on IP traceback. However, regardless of the setting, 
good traceback mechanisms share some common properties:
they should (a) be partially deployable in the network,
(b) result in little or no change in the router hardware,
(c) provide accurate traceback using a small number of packets,
(d) require as little ISP involvement as possible,
(e) perform well in the presence of multiple attack sources and
forms, and (f) have a low-complexity mechanism for identifying
attackers. These properties also serve as the evaluation metrics
when comparing different traceback approaches.

The importance of the IP traceback problem has led to a 
large body of research in the field, resulting in the development 
of many interesting traceback mechanisms and methodologies 
to date. We briefly describe some of them: 

(i) Savage et al. [4] proposed one of the earliest probabilistic
traceback mechanisms where routers randomly
mark packets with their partial path information during
the process of packet-forwarding. The main disadvantage
of the scheme is the combinatorial computational
complexity of the traceback process.

(ii) Song and Perrig [5] proposed an improved and authenticated
packet-marking scheme with the ability to cope
with multiple attacks. However, the traceback process
by any workstation needs the knowledge of its current
upstream router map to all attackers.

(iii) Bellovin et al. [6] developed iTrace, a traceback scheme
where routers randomly send their IP addresses in the form
of special packets to the source or destination IP address
of the data packets. The use of special packets generates
additional traffic; besides, every workstation has to wait
long enough to receive a sufficient number of
special packets to carry out traceback.

(iv) Dean et al. [7] suggested a novel algebraic approach to
the IP traceback problem: encoding the IP addresses
of the routers a packet passes through into a polynomial.
This allows reconstruction of the entire path in one go
after receiving a sufficient number of packets.

(v) Adler [8] gave a detailed theoretical analysis of the
traceback problem, described the tradeoffs of the probabilistic
packet-marking scheme and proposed a 1-bit packet
marking method to counter DoS attacks.

(vi) Snoeren et al. [9] proposed SPIE, a mechanism which
tracks every packet through querying of the states of the
upstream routers. However, this requires the routers to
store a large amount of state information.

(vii) Thing and Lee [10] showed that the performance of a
traceback process in a wireless ad-hoc network depends
on the routing protocol and network size.

In this paper, we perform traceback in a continuous manner, 
with the goal of ensuring that the destination(s) in a network 
stay well informed of the path(s) traversed by the packets 
received by them. We desire that the technique used for 
traceback is such that each node in the network remains 
blind to the global network topology and the changes in it. 
Essentially, when a change in topology occurs, we require 
that the destination(s) alone detect this change and initiate 
an incremental traceback analysis while the remaining nodes 
(including the source(s)) remain oblivious to the change. 

Towards the end of developing an incremental traceback
mechanism with the desired qualities, we use the framework of
algebraic traceback as developed by Dean et al. [7]. Once the
algebraic traceback process is initialized using the algorithm
in [7], we show that $O(\log d)$ marked packets and a traceback
algorithm with a computational complexity of $O(d \log d)$
operations per execution are sufficient to track a change (node
addition or deletion) in a path involving $d$ nodes ($d \in \mathbb{N}$).
Note that, if the non-incremental algebraic traceback process
were repeated each time there is a change in the path, $O(d)$
marked packets would be required to perform traceback. Next,
we argue that our incremental traceback process is order-wise
optimal in terms of the number of marked packets required
and has a lower computational complexity compared to the
conventional non-incremental traceback processes.

The rest of this paper is organized as follows. Sections II and
III give the system model and a detailed review of the algebraic
traceback mechanism, respectively. The incremental traceback
schemes based on different path encoding versions of algebraic
traceback are presented in Sections IV and V. We describe the
traceback procedure for systems employing network coding in
Section VI. The numerical results are shown in Section VII
and the paper concludes with Section VIII.



II. System Model 

We consider a network represented by a directed graph. The 
nodes in the graph (identifiable with routers in the network) 
have unique identifiers (IDs) that come from the finite field 
$GF(p)$, for some suitable prime number $p$. A directed edge
between a pair of nodes in the graph represents an error-free 
channel. We assume that the transmissions across different 
edges do not interfere with each other in any way. 

Each node can act as a source, a destination or an intermedi- 
ate packet-forwarding node, depending on the communication 
pattern in the network. We focus our attention on one such 
source and destination, represented in the graph by nodes $r_1$
and $D$ respectively. The source transmits data to the destination
via the path $V = (r_1, r_2, \ldots, r_d, D)$. However, this path
may change over the course of the transmission due to the
dynamic nature of the network/graph. We want to develop
an incremental algebraic traceback mechanism that enables
destination $D$ to figure out this change in path $V$.

Fig. 1. Dynamic behavior of path $V$.

We assume that there is the possibility of node-ID spoofing, 
i.e., a malicious node in path V misreporting its ID to avoid 
detection by destination D. We also limit our incremental 
traceback approach to track single node addition and deletion 
in path V. This is deliberate, as conventionally, in wireless 
networks, the timescale at which routes/paths change (of the 
order of seconds) is many orders of magnitude greater than the 
timescale of data transmission (of the order of milliseconds or 
less). Thus, any one change can be detected before additional 
changes occur in a path. Our algorithm and analysis framework 
can be naturally extended to scenarios when multiple nodes 
can enter or leave path V. The assumption also makes the 
algorithm description and proofs much more intuitive and 
concise, and therefore we focus on this simple case. 

III. Review: Algebraic Traceback 

In this section, we present certain relevant aspects of algebraic
traceback as developed by Dean et al. [7]. The idea
behind this traceback scheme is that a polynomial of degree
$n$ in $GF(p)$ is completely determinable using $(n + 1)$ of its
evaluations at distinct points in $GF(p)$. Though originally
designed for IP traceback to counter DoS attacks, the approach
can be generalized to traceback in non-IP based networks.

A. Deterministic Path Encoding 

The deterministic path encoding scheme is used when no
node-ID spoofing is suspected. The packet marking process is
initiated by the first node that encounters the packet (the source
node, which is $r_1$ for path $V$). We include a flag-bit field
and a hop-count field (with initial values 0) in each packet
in the network. The flag-bit and hop-count values are set
to 1 when a packet is marked; otherwise the flag-bit value
remains unchanged and each node following the source node
just increments the hop-count by 1. In path $V$, when node
$r_1$ initiates the process of marking a packet (with some
probability, say $q_1$), it encodes a value-pair $(x, y)$ into it, where
$x$ is chosen randomly from $GF(p)$ and $y = r_1$. If node $r_i$
($i = 2, \ldots, d$) encounters a marked packet, it uses the values
$x$, $y$, $r_i$ to update the value of $y$ as follows:

$$y \leftarrow y \cdot x + r_i. \tag{1}$$

Hence, any marked packet received by destination $D$ has a
value-pair of the form $(x, y(x))$ encoded in it, where

$$y(x) = \sum_{i=0}^{d-1} r_{d-i}\, x^i.$$
If destination $D$ receives $d$ value-pairs $(x_i, y(x_i))$, $i =
1, 2, \ldots, d$, where $x_i \neq x_j\ \forall i \neq j$, path $V$ can be
reconstructed by solving the following matrix equation:

$$\begin{bmatrix} 1 & x_1 & \cdots & x_1^{d-1} \\ 1 & x_2 & \cdots & x_2^{d-1} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_d & \cdots & x_d^{d-1} \end{bmatrix} \begin{bmatrix} r_d \\ r_{d-1} \\ \vdots \\ r_1 \end{bmatrix} = \begin{bmatrix} y(x_1) \\ y(x_2) \\ \vdots \\ y(x_d) \end{bmatrix}$$

The value of $d$ is obtained from the hop-count field of
the marked packets. The matrix in the equation
is a full-rank Vandermonde matrix, and thus the system of
equations can be solved in $O(d^2)$ operations. Thus, path $V$ is
determinable using $d$ marked packets, provided the $x$-values
encoded in them are distinct. This can be ensured with high
probability by making the source $r_1$ keep a record of the
$x$-values it has used while marking packets, thereby avoiding
re-use of the $x$-values until the marking of at least $p$ packets.
Therefore, choosing a large enough $p$ can ensure that $O(d)$
marked packets are sufficient for retrieval of path $V$.
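
To make the encoding and decoding steps concrete, the following is a minimal Python sketch, not the implementation of [7]. The prime `P = 257`, the example node IDs and the function names `mark` and `reconstruct` are illustrative assumptions. For brevity the Vandermonde system is solved by plain Gauss-Jordan elimination modulo $p$, which costs $O(d^3)$ operations rather than the $O(d^2)$ achievable with a dedicated Vandermonde solver.

```python
import random

P = 257  # hypothetical small prime; node IDs and x-values live in GF(P)

def mark(path, x, p=P):
    """Apply the update y <- y*x + r_i along the path (y starts as r_1)."""
    y = 0
    for r in path:
        y = (y * x + r) % p
    return y

def reconstruct(pairs, p=P):
    """Recover (r_1, ..., r_d) from d value-pairs with distinct x-values."""
    d = len(pairs)
    # Augmented Vandermonde rows: [1, x, ..., x^(d-1) | y(x)].
    rows = [[pow(x, k, p) for k in range(d)] + [y] for x, y in pairs]
    for col in range(d):
        piv = next(i for i in range(col, d) if rows[i][col])
        rows[col], rows[piv] = rows[piv], rows[col]
        inv = pow(rows[col][col], -1, p)  # modular inverse (Python 3.8+)
        rows[col] = [v * inv % p for v in rows[col]]
        for i in range(d):
            if i != col and rows[i][col]:
                f = rows[i][col]
                rows[i] = [(u - f * v) % p for u, v in zip(rows[i], rows[col])]
    coeffs = [row[-1] for row in rows]  # coefficient of x^k is r_(d-k)
    return coeffs[::-1]                 # reorder as (r_1, ..., r_d)

path = [17, 42, 99, 5, 200]                  # example IDs r_1, ..., r_5
xs = random.sample(range(P), len(path))      # distinct x-values
pairs = [(x, mark(path, x)) for x in xs]
recovered = reconstruct(pairs)
```

Here `recovered` equals `path` whenever the sampled $x$-values are distinct, which `random.sample` guarantees.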

B. Randomized Path Encoding 

The deterministic path encoding scheme may be infeasible 
if node-ID spoofing is possible and/or the first node to receive 
a packet is unsure if it is indeed the source node (for example, 
if ri does not know it is the source node in path V). Then 
we require a probabilistic traceback mechanism to address this 
situation. For path $V$, node $r_1$ initiates marking of the packet as
before (with probability $q_1$), but now each intermediate node
$r_i$ ($i = 2, \ldots, d$) clears an existing marking, if any, and re-marks
a packet with probability $q_i$. Else, with probability $(1 - q_i)$,
each node $r_i$ just follows the update mechanism given by
(1). The following pseudo-code summarizes this procedure:
Marking scheme at node r_i:
for each packet w
    with probability q_i
        x = random;
        y = r_i;
        flagbit = 1;
        hopcount = 1;
    otherwise if flagbit = 1
        y ← y · x + r_i;
        hopcount ← hopcount + 1;
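
The pseudo-code above can be simulated with a short Python sketch. This is an illustration under stated assumptions, not the code of [7]: the prime `P`, the common marking probability `0.2`, the example IDs and the helper names `forward` and `suffix_poly` are hypothetical. The sketch also computes the value the destination would expect from the sub-path formed by the last `hop` nodes, so the two can be compared.

```python
import random

P = 257  # hypothetical prime field size

def forward(path, q, rng, p=P):
    """Send one packet along `path`; every node re-marks with probability q.
    Returns (x, y, hopcount), or None if the packet arrives unmarked."""
    flag, x, y, hop = 0, 0, 0, 0
    for r in path:
        if rng.random() < q:
            # Clear any existing marking and start a fresh one.
            x, y, flag, hop = rng.randrange(p), r, 1, 1
        elif flag == 1:
            y = (y * x + r) % p
            hop += 1
    return (x, y, hop) if flag else None

def suffix_poly(path, hop, x, p=P):
    """Value expected from the sub-path made of the last `hop` nodes."""
    y = 0
    for r in path[len(path) - hop:]:
        y = (y * x + r) % p
    return y

rng = random.Random(0)
path = [17, 42, 99, 5, 200]
received = [m for m in (forward(path, 0.2, rng) for _ in range(2000)) if m]
consistent = all(suffix_poly(path, h, x) == y for x, y, h in received)
```

The hop-count alone tells the destination which sub-path a value-pair belongs to, which is why `suffix_poly` needs only `hop` and `x`.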



We assign non-trivial values to the marking probabilities
$q_i$, $i = 1, 2, \ldots, d$, such that the traceback process remains
accurate while not requiring a very large overhead. For example,
[7] examines the case when $q_i = q \in (0, 1)\ \forall i$. Then, apart
from marked packets with value-pairs corresponding to path
$V$, there are marked packets with value-pairs corresponding
to sub-paths $V_i = (r_{i+1}, r_{i+2}, \ldots, r_d)$, $i = 1, 2, \ldots, d - 1$,
as well. A marked packet received by destination $D$ has a
value-pair of the form $(x, y(x))$ where

$$y(x) = \sum_{i=0}^{k} r_{d-i}\, x^i, \quad k = 0, 1, \ldots, d - 1.$$

These marked packets can be segregated, in terms of the
sub-paths their value-pairs correspond to, on the basis of their
hop-count value¹, as a hop-count of $i$ ($< d$) implies that the
value-pair is for $V_{d-i}$ and, consequently, a hop-count of $d$ implies
that the value-pair is for path $V$. Using this, the sub-paths, and
therefore the entire path $V$, can be reconstructed after receiving
a sufficient number of marked packets, in a manner similar to
deterministic path encoding. The $x$-values across nodes can
be kept distinct (to ensure invertibility of the
resulting matrix at the destination) by requiring that the nodes
with non-zero marking probabilities keep track of the
$x$-values they use while marking packets and only reuse values
when all elements in $GF(p)$ have been exhausted.

Let $f_i$, $i = 1, 2, \ldots, d$, be defined as the fraction of
packets marked by node $r_i$ and received by destination $D$.
Then $f_i$ can be expressed in terms of $q_i$, $i = 1, 2, \ldots, d$, as

$$f_i = \begin{cases} q_i \prod_{j=i+1}^{d} (1 - q_j) & \text{if } i \neq d \\ q_d & \text{if } i = d \end{cases}$$

with the fraction of unmarked packets given by $f_0 = 1 -
\sum_{i=1}^{d} f_i = \prod_{i=1}^{d} (1 - q_i)$. This makes the fraction of marked
packets coming from source $r_1$ equal to $\frac{f_1}{1 - f_0}$, i.e., one out of
$\lceil \frac{1 - f_0}{f_1} \rceil$ marked packets is from node $r_1$ on average. Since $d$
marked packets from node $r_1$ with distinct $x$-values are needed
for determining path $V$, an average of $d \lceil \frac{1 - f_0}{f_1} \rceil$ marked packets
needs to be received by destination $D$ to ensure that $d$ packets
among them have value-pairs corresponding to path $V$.
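
The fractions $f_i$ and the resulting packet count are easy to evaluate numerically. The sketch below is illustrative only; the function names and the example probabilities $q_i = 1/10$ on a four-node path are assumptions. Exact rational arithmetic is used so that the ceiling is not affected by rounding.

```python
from fractions import Fraction
from math import ceil, prod

def marking_fractions(q):
    """q[i-1] holds q_i. Returns (f_0, [f_1, ..., f_d])."""
    d = len(q)
    # f_i = q_i * prod_{j > i} (1 - q_j); the empty product gives f_d = q_d.
    f = [q[i] * prod((1 - qj for qj in q[i + 1:]), start=Fraction(1))
         for i in range(d)]
    f0 = prod((1 - qi for qi in q), start=Fraction(1))
    return f0, f

def expected_marked_packets(q):
    """Average number of marked packets, d * ceil((1 - f_0) / f_1)."""
    f0, f = marking_fractions(q)
    return len(q) * ceil((1 - f0) / f[0])

q = [Fraction(1, 10)] * 4
f0, f = marking_fractions(q)          # f0 = 6561/10000, f[0] = 729/10000
n = expected_marked_packets(q)        # 4 * ceil(3439/729) = 20
```

For this example, roughly one marked packet in five originates at $r_1$, so about 20 marked packets are needed on average to collect the 4 that describe the full path.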

If $q_i = q\ \forall i$, we have $f_0 = (1 - q)^d$ and $f_1 = q(1 - q)^{d-1}$,
which gives the average number of marked packets as

$$d \left\lceil \frac{1 - f_0}{f_1} \right\rceil = d \left\lceil \sum_{i=0}^{d-1} \frac{1}{(1 - q)^i} \right\rceil.$$

As $q \to 0$, the above quantity goes to $d^2$. Hence, if $q$ is chosen
reasonably small, an average of $O(d^2)$ marked packets is
sufficient for determining path $V$. But $f_0$ is large for small
$q$, which is inefficient as destination $D$ then has to wait for a
longer time to receive a sufficient number of marked packets for
performing traceback. Thus, there is a tradeoff in the value of
$q$. Even for the general case of marking probabilities, $d \lceil \frac{1 - f_0}{f_1} \rceil$
becomes smaller as $f_0$ and $f_1$ become large. But $f_0$ cannot be
very large, causing a tradeoff. Regardless of this tradeoff, an
average of $O\left(d \left(\frac{1 - f_0}{f_1}\right)\right)$ marked packets is necessary.

¹For simplicity, we assume that the hop-count field is not attacked. If this
field is attackable, then alternate mechanisms for path reconstruction exist,
such as the Guruswami-Sudan algorithm based mechanism presented in [7].

IV. Inc. Traceback: Deterministic Path Encoding 

In this section, we present an incremental traceback approach
based on the methodology of deterministic path encoding.
We adopt the same encoding/marking procedure, i.e., the
source node initiates the packet marking process. As discussed
earlier, path $V$ can be ascertained using $O(d)$ marked packets
with a computational complexity of $O(d^2)$. Our interest is in
the case when this initial process has occurred, and then path
$V$ changes due to node addition or deletion. A conventional
traceback mechanism would repeat the traceback procedure,
i.e., destination $D$ would wait until it receives $O(d)$
marked packets again, reconstruct the modified path and
then determine where the change has occurred. This scheme
proves to be inefficient, as the number of marked packets and
the computational load incurred remain the same. The proposed
incremental traceback method makes use of the fact that path
$V$ is known to destination $D$ (due to an initial traceback
process) to determine the change using $O(\log d)$ marked
packets with a computational complexity of $O(d \log d)$.

The change in topology of path $V$ involves either addition
or deletion of a single node, which can be detected using the
hop-count value of a marked packet: it changes from $d$ to
$(d + 1)$ for node addition and to $(d - 1)$ for node deletion. We
examine these two cases separately.

A. Node Addition 

Note again that the encoding process remains the same
as before (as in Section III-A). In incremental traceback, all
that changes is the decoding algorithm at the destination $D$.
Suppose a node with ID $s$ gets added to path $V$ in the $m$th
position, $1 \le m \le d + 1$ (the 1st position refers to the position
before node $r_1$ and the $(d + 1)$th position refers to the position
after node $r_d$). Then the new packets have value-pairs of the
form $(x, z(x))$ encoded in them, where

$$z(x) = a_m(x) + x^{d-m+1}\left(s + x\, b_m(x)\right). \tag{2}$$

$a_k(x)$ and $b_k(x)$ are polynomials given by

$$a_k(x) = \begin{cases} r_d + r_{d-1} x + \ldots + r_k x^{d-k} & \text{if } k \neq d + 1 \\ 0 & \text{if } k = d + 1 \end{cases} \tag{3}$$

$$b_k(x) = \begin{cases} r_{k-1} + r_{k-2} x + \ldots + r_1 x^{k-2} & \text{if } k \neq 1 \\ 0 & \text{if } k = 1 \end{cases} \tag{4}$$

for $k = 1, 2, \ldots, d + 1$. These polynomials are known to
destination $D$ from the usual traceback performed previously,
which gives $r_1, r_2, \ldots, r_d$. The polynomials also satisfy

$$y(x) = \sum_{i=0}^{d-1} r_{d-i}\, x^i = a_k(x) + x^{d-k+1}\, b_k(x) \quad \forall k,$$

where $y(x)$ refers to the $y$-value of the marked packet
received by destination $D$ prior to the addition of node $s$.



Suppose $(x_i, z_i)$, $i = 1, 2, \ldots, l$, are the value-pairs encoded
in $l$ marked packets received after the addition of $s$ in path $V$.
We consider the following set of equations:

$$z_j = a_k(x_j) + x_j^{d-k+1}\left(s + x_j\, b_k(x_j)\right), \quad 1 \le j \le l. \tag{5}$$

From (2), the set of equations is consistent for $k = m$. For $k \neq
m$, the set of equations is not consistent with high probability
(this is established by Theorem 1 below). We make use of
this property to design an incremental traceback algorithm for
destination $D$ as follows:
Algorithm I

(i) Construct a $(d + 1) \times l$ matrix $S = [s_{kj}]$ where

$$s_{kj} = \frac{z_j - a_k(x_j)}{x_j^{d-k+1}} - x_j\, b_k(x_j).$$

(ii) If there exists a unique row in $S$ with equal elements,
say the $m$th row, declare that the new node is in the $m$th
position with ID $s = s_{mj}$, $1 \le j \le l$.

(iii) If there exists more than one row in $S$ with equal
elements, declare that an error has occurred. Wait for
more value-pairs to arrive through marked packets, say
$(x_i, z_i)$, $i = l + 1, \ldots, l + e$, where $e$ is an integer of
smaller order compared to $l$. Repeat the algorithm using
the value-pairs $(x_i, z_i)$, $i = e + 1, \ldots, l + e$. Theorem
1 below shows that the algorithm terminates with high
probability while obtaining the correct node ID.
Theorem 1: A newly added node in path $V$ can be identified
by destination $D$ using $l = O(\log d)$ marked packets and
Algorithm I, with a computational complexity of $O(d \log d)$.
Proof: From (5), it is clear that all elements of the $m$th row of
$S$ will be equal. If this is the only such row, we have the correct
new node position and ID $s = s_{mj}$, $1 \le j \le l$. An error occurs
if there exists another row $i \neq m$ such that all elements of
the $i$th row are equal as well. To determine the probability
of this happening, we note that $x_j$ is chosen uniformly over
$GF(p)$. This makes $s_{kj}$ uniform for any $k \neq m$, since each
$s_{kj}$ is purely a function of $x_j$. So, $s_{ij}$, $j = 1, 2, \ldots, l$, is an
i.i.d. uniform random process. This gives

$$\Pr(s_{ij} = s_{ij'}) = \frac{1}{p} = 2^{-\log_2 p}$$

for any $1 \le j, j' \le l$ and $j \neq j'$. Let $E_i$ be the event that
all elements of the $i$th row of $S$ are the same. Then we have
$\Pr(E_i) = 2^{-l \log_2 p}$ for $i \neq m$, since there are $l$ elements in
each row. The probability of error is

$$P_e = \Pr\left(\bigcup_{i \neq m} E_i\right) \le d \Pr(E_i) = 2^{\log_2 d - l \log_2 p},$$

where the inequality above is due to the union bound. $P_e$ can
be made arbitrarily small if $\log_2 d - l \log_2 p$ can be made as
negative as possible. If we require that $l > \frac{\log_2 d}{\log_2 p}$, then this can
be satisfied. Thus, we choose $l = \left\lceil \frac{\log_2 d}{\log_2 p} + \delta \right\rceil$, where $\delta \in \mathbb{N}$
is a small constant. Then $P_e$ gets upper-bounded as

$$P_e \le 2^{\log_2 d - l \log_2 p} = \frac{1}{p^{\delta}}\, 2^{\log_2 d - \log_2 p \left\lceil \frac{\log_2 d}{\log_2 p} \right\rceil} \le \frac{1}{p^{\delta}},$$

where the second inequality follows from the fact that $a -
b \lceil a/b \rceil \le 0\ \forall a, b \in \mathbb{R}$, $b > 0$. By choosing a large enough
value for $p$, $P_e$ can be bounded above by any arbitrarily small
positive value. In other words, $l = O(\log d)$ is sufficient for
determining the newly added node correctly.

Since the algorithm relies on the computation of $S$, which
has $(d + 1)l$ entries, we get a complexity of $O(d \log d)$ (since
$l = O(\log d)$). This completes our proof.
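
Algorithm I can be sketched in a few lines of Python. This is an illustration under assumptions rather than a reference implementation: the prime `P = 65537`, the example IDs, the number of pairs (five, chosen by hand) and all function names are hypothetical, and the $x$-values are drawn from the non-zero field elements so that the division by $x_j^{d-k+1}$ is always defined.

```python
import random

P = 65537  # hypothetical prime; IDs and x-values are elements of GF(P)

def mark(path, x, p=P):
    """Value y(x) obtained by applying y <- y*x + r_i along `path`."""
    y = 0
    for r in path:
        y = (y * x + r) % p
    return y

def a(path, k, x, p=P):
    """a_k(x) = r_d + r_(d-1) x + ... + r_k x^(d-k); zero for k = d+1."""
    d = len(path)
    return sum(path[i - 1] * pow(x, d - i, p) for i in range(k, d + 1)) % p

def b(path, k, x, p=P):
    """b_k(x) = r_(k-1) + r_(k-2) x + ... + r_1 x^(k-2); zero for k = 1."""
    return sum(path[i - 1] * pow(x, k - 1 - i, p) for i in range(1, k)) % p

def find_added_node(path, pairs, p=P):
    """Return (m, s) if exactly one row of S is constant, else None."""
    d = len(path)
    hits = []
    for k in range(1, d + 2):
        # Row k of S, stored as a set: a constant row collapses to one value.
        row = {((z - a(path, k, x, p)) * pow(x, -(d - k + 1), p)
                - x * b(path, k, x, p)) % p for x, z in pairs}
        if len(row) == 1:
            hits.append((k, row.pop()))
    return hits[0] if len(hits) == 1 else None

old_path = [11, 22, 33, 44, 55, 66]             # r_1, ..., r_d known to D
new_path = old_path[:2] + [999] + old_path[2:]  # s = 999 added at m = 3
xs = random.Random(1).sample(range(1, P), 5)    # distinct, non-zero x-values
pairs = [(x, mark(new_path, x)) for x in xs]
result = find_added_node(old_path, pairs)       # (3, 999)
```

The destination uses only the old path and the new value-pairs; `new_path` appears here solely to generate the packets.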

B. Node Deletion 

Suppose node $r_m$ ($1 \le m \le d$) gets deleted from path $V$,
leaving behind $d - 1$ nodes. Then the new marked packets
carry value-pairs of the form $(x, w(x))$, where

$$w(x) = a_m(x) - x^{d-m}\left(r_m - b_m(x)\right). \tag{6}$$

$a_k(x)$ and $b_k(x)$ are polynomials as defined in (3) and (4).

Suppose $(x_i, w_i)$, $i = 1, 2, \ldots, l$, are the value-pairs
received from $l$ marked packets after deletion of node $r_m$. We
consider the following set of equations:

$$w_j = w(x_j) = a_k(x_j) - x_j^{d-k}\left(r_k - b_k(x_j)\right), \quad 1 \le j \le l. \tag{7}$$

From (6), the set of equations is consistent for $k = m$. For $k \neq
m$, the set of equations is not consistent with high probability
(proved in Theorem 2). We make use of this property to design
an incremental traceback algorithm for destination $D$, for the
case of node deletion, as follows:
Algorithm II

(i) Construct a $d \times l$ matrix $R = [\hat{r}_{kj}]$ where

$$\hat{r}_{kj} = b_k(x_j) - \frac{w_j - a_k(x_j)}{x_j^{d-k}}.$$

(ii) If there exists a unique row in $R$ with equal elements,
say the $m$th row, declare that the deleted node was in the
$m$th position with ID $r_m = \hat{r}_{mj}$, $1 \le j \le l$.

(iii) If there exists more than one row in $R$ with equal
elements, declare that an error has occurred. Wait to
receive more value-pairs through marked packets, say
$(x_i, w_i)$, $i = l + 1, \ldots, l + e$, where $e$ is an integer of
smaller order compared to $l$. Repeat the algorithm using
the value-pairs $(x_i, w_i)$, $i = e + 1, \ldots, l + e$. Theorem
2 below shows that the algorithm terminates with high
probability while obtaining the correct node ID.

Theorem 2: A deleted node in path $V$ can be identified
by destination $D$ using $l = O(\log d)$ marked packets and
Algorithm II, with a computational complexity of $O(d \log d)$.
Proof: From (7), all elements of the $m$th row of $R$ will be
equal. If this is the only such row, we have the correct deleted
node ID $r_m = \hat{r}_{mj}$, $1 \le j \le l$. An error occurs if there exists
another row $i \neq m$ such that all elements of the $i$th row are
equal as well. Using the same argument as in the proof of
Theorem 1, we get $\hat{r}_{ij}$, $j = 1, 2, \ldots, l$, to be an i.i.d. uniform
random process. This gives

$$\Pr(\hat{r}_{ij} = \hat{r}_{ij'}) = \frac{1}{p} = 2^{-\log_2 p}$$

for $1 \le j, j' \le l$ and $j \neq j'$. Let $E_i$ be the event that all
elements of the $i$th row of $R$ are the same. Then $\Pr(E_i) = 2^{-l \log_2 p}$
for $i \neq m$, and the probability of error is

$$P_e = \Pr\left(\bigcup_{i \neq m} E_i\right) \le (d - 1) \Pr(E_i) \le 2^{\log_2 d - l \log_2 p},$$

where the first inequality is again due to the union bound. Since the
upper bound on $P_e$ is the same as that for the case of node
addition, using the same approach as in the proof of Theorem 1, we
conclude that $P_e$ can be bounded above by any arbitrarily small
positive value and $l = O(\log d)$ is sufficient for determining
the deleted node's location and ID with high probability. Since
the algorithm makes use of $R$, which has $dl$ entries, this results
in a computational complexity of $O(d \log d)$ ($l = O(\log d)$).
This completes our proof.
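
A corresponding Python sketch for Algorithm II follows; the prime, the example IDs and the function names are again hypothetical. One interpretive choice is labeled here explicitly: substituting (6) into the expression for $\hat{r}_{kj}$ with $k = m + 1$ gives $b_{m+1}(x) - x\, b_m(x) = r_m$, so row $m + 1$ also has equal elements. The sketch therefore accepts row $k$ only when its common value equals the known ID $r_k$, which is how we read consistency of (7); this extra check is our assumption and not a statement of the original algorithm.

```python
import random

P = 65537  # hypothetical prime; IDs and x-values are elements of GF(P)

def mark(path, x, p=P):
    """Value obtained by applying y <- y*x + r_i along `path`."""
    y = 0
    for r in path:
        y = (y * x + r) % p
    return y

def a(path, k, x, p=P):
    """a_k(x) = r_d + r_(d-1) x + ... + r_k x^(d-k)."""
    d = len(path)
    return sum(path[i - 1] * pow(x, d - i, p) for i in range(k, d + 1)) % p

def b(path, k, x, p=P):
    """b_k(x) = r_(k-1) + r_(k-2) x + ... + r_1 x^(k-2); zero for k = 1."""
    return sum(path[i - 1] * pow(x, k - 1 - i, p) for i in range(1, k)) % p

def estimate_row(path, k, pairs, p=P):
    """Row k of R as a set: b_k(x_j) - (w_j - a_k(x_j)) / x_j^(d-k)."""
    d = len(path)
    return {(b(path, k, x, p) - (w - a(path, k, x, p)) * pow(x, -(d - k), p)) % p
            for x, w in pairs}

def find_deleted_node(path, pairs, p=P):
    """Return (m, r_m) if exactly one row is constant AND equal to r_k."""
    hits = [(k, path[k - 1]) for k in range(1, len(path) + 1)
            if estimate_row(path, k, pairs, p) == {path[k - 1] % p}]
    return hits[0] if len(hits) == 1 else None

old_path = [11, 22, 33, 44, 55, 66]            # r_1, ..., r_d known to D
new_path = old_path[:3] + old_path[4:]         # r_4 = 44 deleted
xs = random.Random(2).sample(range(1, P), 5)   # distinct, non-zero x-values
pairs = [(x, mark(new_path, x)) for x in xs]
result = find_deleted_node(old_path, pairs)    # (4, 44)
row_after = estimate_row(old_path, 5, pairs)   # {44}: constant, but not r_5
```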

Thus, be it node addition or deletion, $O(\log d)$ marked
packets are always sufficient for destination $D$ to determine
the change in path $V$ accurately. Before we proceed to
randomized traceback algorithms, a quick note on the order-wise
optimality of Algorithms I and II. Note that, from principles of
information theory [11], it is well known that the entropy of a
uniform source with an alphabet of size $k$ is $\log_2 k$ bits. Thus,
even if a centralized mechanism existed to communicate the
location of the node being inserted/deleted, it would require
$O(\log_2 d)$ bits to do so, as there are $d$ equally likely places
for the change. Our distributed mechanism uses $\left\lceil \frac{\log_2 d}{\log_2 p} + \delta \right\rceil$
packets or approximately $2(\log_2 d + \delta \log_2 p)$ bits. Thus, in
terms of the order of growth of network overhead in $d$, the
incremental traceback mechanism is order-wise optimal.
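
As a rough numerical illustration of these quantities, the sketch below evaluates the packet count, the bit count and the error bound $1/p^{\delta}$ from the proof of Theorem 1. The values $d = 1024$, $p = 65537$ and $\delta = 2$ are assumptions picked for the example, and each packet is taken to carry two field elements of about $\log_2 p$ bits each.

```python
from math import ceil, log2

def packets_needed(d, p, delta):
    """l = ceil(log2(d) / log2(p) + delta)."""
    return ceil(log2(d) / log2(p) + delta)

d, p, delta = 1024, 65537, 2          # hypothetical example values
l = packets_needed(d, p, delta)       # ceil(0.625 + 2) = 3 packets
bits = 2 * l * log2(p)                # two field elements (x, z) per packet
error_bound = p ** -delta             # P_e <= 1 / p^delta
```

With these values, three marked packets (under 100 bits of marking data) suffice, compared with the $d = 1024$ marked packets a full non-incremental traceback would need.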

V. Inc. Traceback: Randomized Path Encoding 

In this section, we present an incremental traceback approach,
useful when node-ID spoofing is suspected, utilizing
the randomized path encoding framework. In this setup, each
node $r_i$ decides to clear any existing marks and re-initiate the
marking process with some probability $q_i$. As multiple nodes
on path $V$ now act as source nodes, we receive different
(sub) polynomial evaluations across time. The marked packets
carry value-pairs corresponding to both the sub-paths $V_i$, $i =
1, 2, \ldots, d - 1$, and the entire path $V$. As described in
Section III-B, path $V$ can be initially determined using an
average of $O\left(d \left(\frac{1 - f_0}{f_1}\right)\right)$ marked packets with a computational
complexity of at least $O(d^2)$. Once path $V$ is known to the
destination, we show that it is possible to track its changes using
fewer marked packets with lower complexity.

Due to the random nature of packet-marking, one cannot 
immediately ascertain if node addition or node deletion has 
occurred from the hop-count value of the marked packets. 
So, we need to consider both the possibilities jointly in our 
analysis. If a node with ID s gets added to path V, the value- 
pair of a new marked packet has information about s encoded 
in it, provided it has traversed a sub-path containing node s. 
Similarly, if node $r_m$ is removed from path $V$, only those
marked packets that traverse sub-paths that contained node
$r_m$ prior to its deletion can provide information about $r_m$.

Note that the number of marked packets required to detect
a change (addition or deletion) in path $V$ is highest when the
change occurs in the first position of the path, i.e., either when
node $r_1$ gets deleted or a new node gets added before it. In
such a situation, the marked packets that are useful in tracking
this change are the ones that are marked by the first node and by
no other node along the new path, which we call $V'$. Let
$f'_i$ denote the fraction of packets received by the destination
and marked by the $i$th node in path $V'$. Then, the fraction
of marked packets originating at the first node along
path $V'$ is $\frac{f'_1}{1 - f'_0}$, where $f'_0 = 1 - \sum_{i \ge 1} f'_i$ is the fraction
of unmarked packets. This implies that, from an average of
$l \left\lceil \frac{1 - f'_0}{f'_1} \right\rceil$ new marked packets received by the destination after
a change (addition or deletion in the path), $l$ marked packets
with the highest hop-counts are likely to come from the node
in the first position on path $V'$. In the following sections,
we show that $l = O(\log d)$ is sufficient to determine the ID,
position and nature of the change in the path $V$, given that the
destination already has knowledge of the path $V$.

Let us start with the assumption that a new node $s$ gets
added at the $m$th position in path $V$ ($1 \le m \le d + 1$). Now, a
marked packet with hop-count $h$, where $d - m + 2 \le h \le d + 1$,
contains information that includes the ID $s$. Therefore, the
value-pair for this packet can be rewritten as

$$z(x) = a_m(x) + x^{d-m+1}\left(s + x\, b_{m,h}(x)\right). \tag{8}$$

$a_k(x)$ is defined as in (3) and $b_{k,h}(x)$ is defined as

$$b_{k,h}(x) = r_{k-1} + r_{k-2} x + \ldots + r_{d-h+2}\, x^{k-d+h-3}$$

for $k > d - h + 2$, and $b_{k,h}(x) = 0$ for $k = d - h + 2$.

Similarly, if node $r_m$ ($1 \le m \le d$) is deleted from path $V$,
then a marked packet with hop-count $h$, where $d - m + 1 \le
h \le d - 1$, contains a value-pair $(x, w(x))$ such that

$$w(x) = a_m(x) - x^{d-m}\left(r_m - b_{m,h+2}(x)\right). \tag{9}$$
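
Identities (8) and (9) can be checked mechanically. The Python sketch below is an illustrative check only, with hypothetical names, a hypothetical prime and example IDs: it builds the modified path, computes the value carried by a packet that was last re-marked `hop` nodes before the destination, and compares it with the right-hand sides of (8) and (9) for every admissible hop-count.

```python
P = 65537  # hypothetical prime field size

def suffix_mark(path, hop, x, p=P):
    """Value carried by a packet whose marking started `hop` nodes before D."""
    y = 0
    for r in path[len(path) - hop:]:
        y = (y * x + r) % p
    return y

def a(path, k, x, p=P):
    """a_k(x) = r_d + r_(d-1) x + ... + r_k x^(d-k); zero for k = d+1."""
    d = len(path)
    return sum(path[i - 1] * pow(x, d - i, p) for i in range(k, d + 1)) % p

def b_h(path, k, h, x, p=P):
    """b_{k,h}(x) = r_(k-1) + ... + r_(d-h+2) x^(k-d+h-3); zero if k = d-h+2."""
    lo = len(path) - h + 2
    return sum(path[i - 1] * pow(x, k - 1 - i, p) for i in range(lo, k)) % p

def check_addition(path, m, s, x, p=P):
    """Verify (8) for all hop-counts d-m+2 <= h <= d+1."""
    d = len(path)
    new_path = path[:m - 1] + [s] + path[m - 1:]
    return all(
        suffix_mark(new_path, h, x, p)
        == (a(path, m, x, p)
            + pow(x, d - m + 1, p) * (s + x * b_h(path, m, h, x, p))) % p
        for h in range(d - m + 2, d + 2))

def check_deletion(path, m, x, p=P):
    """Verify (9) for all hop-counts d-m+1 <= h <= d-1."""
    d = len(path)
    new_path = path[:m - 1] + path[m:]
    return all(
        suffix_mark(new_path, h, x, p)
        == (a(path, m, x, p)
            - pow(x, d - m, p) * (path[m - 1] - b_h(path, m, h + 2, x, p))) % p
        for h in range(d - m + 1, d))

path = [11, 22, 33, 44, 55, 66]
ok_add = all(check_addition(path, m, 999, x)
             for m in range(1, 8) for x in (2, 3, 12345))
ok_del = all(check_deletion(path, m, x)
             for m in range(1, 7) for x in (2, 3, 12345))
```

Note that for $m = 1$ the hop-count range in (9) is empty, so the deletion check is vacuous in that case.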

Depending on whether a node gets added or deleted in
path $V$, path $V'$ has $d + 1$ or $d - 1$ nodes respectively.
Note that, if there is no change in $V$, we have $V' = V$.
So, $f'_0$ and $f'_1$ can take three possible values: one is the
unchanged $f_0$ and $f_1$, and the other two values result from a
change in $V$ (node addition and node deletion). Let $F_0$ and $F_1$
denote those values of $f'_0$ and $f'_1$ that maximize $\frac{1 - f'_0}{f'_1}$ among
these three choices. Suppose $(x_i, z_i)$, $i = 1, 2, \ldots, l$, are the
value-pairs of the marked packets with the highest hop-count
values, say $h_i$, $i = 1, 2, \ldots, l$, among $l \left\lceil \frac{1 - F_0}{F_1} \right\rceil$ marked packets
received by the destination. Then, by an expected/average
value argument, these $l$ packets are marked by nodes close
to node $r_1$ and possess information about the change in path
$V$. If $h_i = d + 1$ for some $i$, it means there has been a node
addition, but if $h_i \le d\ \forall i$, we cannot conclude anything and
have to consider both the possibilities of node addition and
node deletion. We propose the following incremental traceback
algorithm for destination $D$ to determine the change in path $V$:
Algorithm III

(i) Construct a $(d + 1) \times l$ matrix $S = [s_{kj}]$ where

$$s_{kj} = \frac{z_j - a_k(x_j)}{x_j^{d-k+1}} - x_j\, b_{k,h_j}(x_j)$$

for $k \ge d - h_j + 2$ and $s_{kj} = 0$ otherwise.

(ii) If there exists a unique row in $S$, say the $m$th row, such
that all non-zero elements (there should be at least two
non-zero elements) of the row are equal, declare that
there is a new node added in the $m$th position with ID $s$
equal to the non-zero element value.

(iii) If there exists more than one row in $S$ with equal
non-zero elements, declare that an error has occurred. Wait to
get more value-pairs with high hop-count values through
marked packets. Repeat (i), (ii) using these and some of
the earlier value-pairs ($l$ value-pairs in all).

(iv) If there exists no row in $S$ with equal non-zero elements,
construct a $d \times l$ matrix $R = [\hat{r}_{kj}]$ where

$$\hat{r}_{kj} = b_{k,h_j+2}(x_j) - \frac{z_j - a_k(x_j)}{x_j^{d-k}}$$

for $k \ge d - h_j + 1$ and $\hat{r}_{kj} = 0$ otherwise.

(v) If there exists a unique row in $R$, say the $m$th row, such
that all non-zero elements of the row are equal, declare
that the node in the $m$th position has been deleted with ID
equal to the non-zero element value.

(vi) If there exists more than one row in $R$ with equal
non-zero elements, declare that an error has occurred. Wait to
get more value-pairs with high hop-count values through
marked packets. Repeat (iv), (v) using these and some
of the earlier value-pairs ($l$ value-pairs in all).

(vii) If there exists no row in $R$ with equal non-zero elements,
declare that there has been no change in $V$.

Theorem 3: Any change in path $\mathcal{P}$ can be identified by
destination $D$ using $l = O(\log d)$ marked packets, containing
information about the change encoded in them, and Algorithm
III with a computational complexity of $O(d \log d)$.
Proof: The cases of node addition and node deletion cannot
return positive results simultaneously, i.e., both $S$ and $R$ cannot
have unique rows with their non-zero elements equal. Since
the value-pairs from the $l$ marked packets are assumed to
possess information about the change in $\mathcal{P}$, equality of all the
elements, not the non-zero elements alone, of some row of $R$
or S would confirm the change (from (jsjl and (jojl). So, we need 
to show that, for node addition (node deletion), the existence
of more than one row in $S$ ($R$) with equal elements is highly
improbable for $l = O(\log d)$. Note that this is exactly what
we have already established as part of the proofs of Theorems
1 and 2. Also, Algorithm III requires evaluating both $R$ and $S$
in the worst-case situation, each of which has a computational
complexity of $O(d \log d)$. This gives an overall complexity of
$O(d \log d)$. This completes our proof.

Thus, $l = O(\log d)$ marked packets, with the information
of path change encoded in them, and an average of
$O\left((\log d)\left(\frac{1 - f_0}{f_1}\right)\right)$ marked packets in general, are sufficient
to determine the correct change in topology of $\mathcal{P}$.

A. Reducing the requirement on number of marked packets 

In this section, we develop two schemes that enable us to
reduce the average order of marked packets needed to perform
probabilistic traceback. If $q_i = q\ \forall i$, then $f_0 = (1 - q)^d$,
$f_1 = q(1 - q)^{d-1}$ and

$$\frac{1 - f_0}{f_1} = \frac{1 - (1 - q)^d}{q(1 - q)^{d-1}} = \sum_{i=0}^{d-1} \frac{1}{(1 - q)^i}. \qquad (10)$$



Since the quantity in (10) increases with $d$, we have

$$\frac{1 - f_0}{f_1} \le \sum_{i=0}^{d} \frac{1}{(1 - q)^i},$$

which approaches $(d + 1)$ as $q \to 0$. So,
if $q$ is chosen arbitrarily small, an average of $O(d \log d)$
marked packets are sufficient for determining any change in
$\mathcal{P}$. However, a small $q$ implies a larger value for $f_0$, and thus
there is a tradeoff between the two parameters.
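
The closed form in (10) rests on a finite geometric series. As a sketch of the omitted algebra, for a constant marking probability $q$ and a path of $d$ nodes, the step can be written out as:

```latex
\begin{align*}
1 - (1-q)^d &= q \sum_{i=0}^{d-1} (1-q)^i, \\
\frac{1-f_0}{f_1}
  &= \frac{q \sum_{i=0}^{d-1} (1-q)^i}{q\,(1-q)^{d-1}}
   = \sum_{i=0}^{d-1} (1-q)^{\,i-(d-1)}
   = \sum_{j=0}^{d-1} \frac{1}{(1-q)^j}, \\
\lim_{q \to 0} \frac{1-f_0}{f_1} &= d .
\end{align*}
```

Every term of the sum is at least 1 and grows with $q$, so the ratio is never below the number of nodes on the path; for a path of $d + 1$ nodes the same limit is $d + 1$.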

To reduce the average number of marked packets, we must
attempt to make the $f_i$ values comparable to one
another. One way to do this is to make the marking
probability of a packet depend on its hop-count,
i.e., the higher the hop-count value of a packet, the
lower the probability that a node marks it. So, we have
$q_i = q(h)$ where $h$ is the hop-count of a packet and
$q : \mathbb{N} \to [0, 1)$ is a non-increasing function in $h$. This gives

$$f_1 = q(1) \prod_{i=2}^{d} (1 - q(i)) \quad \text{and} \quad f_0 = \prod_{i=1}^{d} (1 - q(i)).$$

Next, we present two packet marking schemes with the aim
of reducing the average number of marked packets needed for
incremental probabilistic traceback.

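1) Scheme 1: We consider a constant $h_0 \in \mathbb{N}$ and the
following marking-probability function:

$$q(h) = \begin{cases} q \in (0, 1) & \text{if } 1 \le h \le h_0 \\ 0 & \text{otherwise} \end{cases}$$

This gives $f_1 = q(1 - q)^{h_0 - 1}$, $f_0 = (1 - q)^{h_0}$ and

$$\frac{1 - f_0}{f_1} = \frac{1 - (1 - q)^{h_0}}{q(1 - q)^{h_0 - 1}} = \sum_{i=0}^{h_0 - 1} \frac{1}{(1 - q)^i} \qquad (11)$$

for $d \ge h_0$. As $q \to 0$, the quantity in (11) goes to $h_0$. So,
the average order of marked packets becomes $O(h_0 \log d) =
O(\log d)$ for $d \ge h_0$. Next, we substitute $q = \frac{1}{h_0}$ and get:

$$\frac{1 - f_0}{f_1} = h_0 \, \frac{1 - \left(1 - \frac{1}{h_0}\right)^{h_0}}{\left(1 - \frac{1}{h_0}\right)^{h_0 - 1}} \qquad (12)$$

for $d \ge h_0$. As $h_0$ increases, the numerator and denominator
of (12) approach $1 - \frac{1}{e}$ and $\frac{1}{e}$ respectively. This makes
$\frac{1 - f_0}{f_1} \approx (e - 1) h_0$. Also, $f_0 \approx \frac{1}{e}$, i.e., about 37% of the
packets remain unmarked in this scheme.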
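
The limiting behaviour of Scheme 1 with $q = 1/h_0$ can be checked numerically. The following is a small sketch that evaluates $f_0$ and $(1 - f_0)/f_1$ directly from the expressions for Scheme 1; the function name `scheme1_stats` is hypothetical.

```python
import math


def scheme1_stats(h0):
    """Return (f0, (1 - f0) / f1) for Scheme 1 with q = 1/h0 and d >= h0."""
    q = 1.0 / h0
    f0 = (1 - q) ** h0            # probability that a packet stays unmarked
    f1 = q * (1 - q) ** (h0 - 1)  # first node marks, nobody re-marks afterwards
    return f0, (1 - f0) / f1


for h0 in (5, 50, 500):
    f0, ratio = scheme1_stats(h0)
    print(h0, round(f0, 4), round(ratio / h0, 4))

print("limits:", round(1 / math.e, 4), round(math.e - 1, 4))
```

For growing $h_0$ the printed unmarked fraction moves towards $1/e$ and the normalised ratio towards $e - 1$, matching the approximation stated above.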

2) Scheme 2: We consider the same constant $h_0$ and the
following marking-probability function:

$$q(h) = \begin{cases} a^h, \ a \in (0, 1) & \text{if } 1 \le h \le h_0 \\ 0 & \text{otherwise} \end{cases}$$

This gives $f_1 = a \prod_{i=2}^{h_0} (1 - a^i)$, $f_0 = \prod_{i=1}^{h_0} (1 - a^i)$ and

$$\frac{1 - f_0}{f_1} = 1 + \frac{1}{a} \left( \frac{1}{\prod_{i=2}^{h_0} (1 - a^i)} - 1 \right) \qquad (13)$$

for $d \ge h_0$. As $a \to 0$, the ratio in (13) goes to 1 and the
average number of marked packets in the system is $O(\log d)$
for $d \ge h_0$. Note that there is a tradeoff in the choice of $a$:
if it is small, then the fraction of unmarked packets is large.
For $a \in (0, \tfrac{1}{2}]$ and $h_0 \ge 3$, we get

$$\frac{1 - f_0}{f_1} \;\ldots\; \frac{1}{(1 - a)(1 - a^3)} \;\ldots \qquad (14)$$



As $a$ varies from very small to $\tfrac{1}{2}$, the quantity in (14) varies
from 1 to 2^ and the fraction of unmarked packets changes 
from close to 100% to around 30%. 

Thus, with an intelligent choice of marking probabilities, 
we can reduce the overall network overhead incurred. 

VI. Traceback for Network Coding 

In the previous sections, we have focused only on a single
path $\mathcal{P}$ with source node $r_1$ and destination $D$. However, a
general graph can have a multicast set-up with a source
communicating with more than one destination. In such a situation,
adopting schemes such as network coding can help increase
the set of rates achievable by the sources in the network. We
use the algebraic traceback framework in this paper to develop
a non-incremental (and incremental) mechanism of performing
traceback in network coded systems.

To better motivate our traceback mechanism, we start with a
simple unicast communication setup without network coding.
Here, one source communicates with only one destination
through a number of paths (Sections I through V have
considered the case where there is just one path being
traced). Note that, for unicast communication, network coding
is not required and the Ford-Fulkerson algorithm [11] gives
us routes that achieve capacity. For a network with unit
capacity links and a mincut of $R$, Ford-Fulkerson returns
$R$ distinct paths from source to destination. We label these
paths as $C_i$, $i = 1, 2, \ldots, R$, and the goal of traceback is to
determine, at the destination, the identities of the nodes
along each path. Note that, if the network mincut is $R$, the
destination receives at least $R$ packets at every time instant.
Here, we assume that the destination can determine which path
$C_i$ a particular packet traversed. For example, if each path were
along a different OFDM sub-channel (in a MANET), then our
assumption implies that the destination can identify the
sub-channel through which each packet is received. Now, both the
non-incremental and incremental traceback schemes described
in Sections III, IV and V can be performed individually on
each of the $C_i$'s, and nodes along all $R$ paths
between source and destination can be identified.

Next, consider a multicast setup where in-network coding is 
used. In other words, there are nodes which generate (random) 
linear combinations of packets which they receive, and forward 
these combinations. We desire to develop a marking scheme 
that will enable us to trace the path taken by the source packet 
even after being linearly combined at the intermediate node 
with other packets. To make our strategy concrete, we take the
well-known 'butterfly' network as an example for our graph
(Figure 2). Note that our traceback procedure is in no way
limited to this butterfly network and can be generalized to
other multicast networks employing network coding.



Fig. 2. The butterfly network (panel a, "Butterfly Network") and its equivalent virtual network (panel b, "Virtual Network")



In Figure 2, $S$ is the source node and $D_1$ and $D_2$ are
the destination nodes. The paths used by packets from $S$
to $D_1$ are $SCD_1$ and $SEABD_1$, and those from $S$ to
$D_2$ are $SED_2$ and $SCABD_2$. Note
that the min-cut for this network is 2 bits, and a rate of 2 for both
$(S, D_1)$ and $(S, D_2)$ is achievable using network coding. To
develop our traceback procedure, consider the virtual network
in Figure 2-b where nodes $A$ and $B$ get split into two new
node-pairs $(A_1, A_2)$ and $(B_1, B_2)$. In this virtual network, the
same rate of 2 is achievable for both $(S, D_1)$ and $(S, D_2)$
without network coding. Moreover, Ford-Fulkerson (routing)
is sufficient to achieve capacity, and a traditional algebraic
packet marking scheme is sufficient to perform traceback.
Thus, for the original network in Figure 2-a, we desire to
"mimic" the virtual network in Figure 2-b. Say $(x_1, y_1)$ and
$(x_2, y_2)$ are the value-pairs received by $A$ from $C$ and $E$
respectively. Then $A$ chooses one of the value-pairs with some
probability, say $(x_1, y_1)$, and updates it using its own ID $a$,
to get $(x_1, y_1')$, where $y_1' = y_1 \cdot x_1 + a$. To ensure that the
same path is not chosen every time, node $A$ may change the
probability of selection in every time-slot. When the chosen
value-pair is received by the other nodes, the same policy as
traditional marking is followed. In this way a destination can
determine the paths to all the sources. For example, destination
$D_1$ can determine the paths $SCD_1$, $SEABD_1$ and $SCABD_1$.
Thus, every destination can recreate the network subgraph
corresponding to packets it observes.
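
The marking rule at a coding node such as $A$ can be sketched in a few lines of Python. This is an illustration under assumptions, not the paper's implementation: the field order `P`, the function name `mark_at_coding_node` and the way selection weights are passed in are all hypothetical, and arithmetic is taken modulo a prime as in algebraic traceback.

```python
import random

P = 65537  # assumed prime field order


def mark_at_coding_node(pairs, node_id, weights=None, rng=random, p=P):
    """Pick one incoming value-pair (x, y) and apply y' = y*x + node_id (mod p).

    `weights` are the selection probabilities (up to scaling); a node may
    change them from one time-slot to the next so that different incoming
    paths get traced over time.
    """
    x, y = rng.choices(pairs, weights=weights, k=1)[0]
    return x, (y * x + node_id) % p


incoming = [(3, 10), (5, 20)]  # value-pairs received from C and E
print(mark_at_coding_node(incoming, node_id=7, weights=[1, 0]))  # always picks the first pair
print(mark_at_coding_node(incoming, node_id=7))  # picks uniformly at random
```

Downstream nodes would apply the same update to the single value-pair they receive, as in traditional marking.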

A. Faulty/Malicious Nodes in Network-Coded Systems 

As described above, a destination in a network-coded sys- 
tem traces a subgraph instead of a path traversed by a packet. 
Here, we describe an approach to identify a malicious/faulty 
node in such a network. We restrict our attention to the case 
in which a single node in the network is faulty or malicious; 
this approach can be extended to the more general case. 

The broad idea is that routing can be performed in such
a way that the subgraph traversed by packets from a set
of sources to a given destination evolves over time. More
precisely, if at time $t_1$, the subgraph $G_1$ traversed by packets
originating at sources $S_1$ and $S_2$ and ending at a destination $D$
is different from the subgraph $G_2$ traversed between sources
$S_1$, $S_2$ and destination $D$ at time $t_2$, then the intersection of
$G_1$ and $G_2$ is small. So, if this subgraph evolves so that
it is different at different time-slots, then for each time-slot
in which decoding fails (due to some node in the subgraph being
malicious or faulty), the subgraph traversed during that
time-slot can be isolated and intersected with the subgraphs of other
such time-slots (when decoding failed). This will enable the
receiver to identify a small set of nodes (in the intersection)
as candidates for the malfunctioning/malicious node.
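
The intersection step described above can be sketched as follows. The sketch assumes that the destination has already traced, for every time-slot in which decoding failed, the set of intermediate node IDs in that slot's subgraph; the function name `suspect_nodes` and the example node labels are hypothetical.

```python
def suspect_nodes(failed_subgraphs):
    """Intersect the node sets of all time-slots in which decoding failed."""
    node_sets = [set(nodes) for nodes in failed_subgraphs]
    if not node_sets:
        return set()
    return set.intersection(*node_sets)


# Intermediate nodes traced in three failed time-slots (illustrative labels).
failed = [
    {"C", "A", "B"},
    {"E", "A", "B"},
    {"A", "F"},
]
print(suspect_nodes(failed))
```

With a single faulty node, that node lies in every failed subgraph, so it always survives the intersection; the routing design decides how quickly the other candidates drop out.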

The subgraph creation needs to be done carefully, so that 
every k subgraphs (for some chosen k) have a nonempty 
but not too large intersection. We defer the details of such 
a construction to a future version of the paper. 

VII. Numerical Results 

In this section, we present some numerical results on the
number of marked packets required to successfully perform
algebraic traceback. We consider a network where the nodes
have 16-bit long IDs. This means the order $p$ of the prime
field, where the identities come from, should be greater than
$2^{16} - 1$. We assume $p = 2^{16} + 1$, which is the smallest prime
greater than $2^{16} - 1$. Then for deterministic path encoding,
for a dynamic path $\mathcal{P}$ of length $d$, the number of marked
packets needed for determining the path initially is $d$. As
derived in Section IV, the number of marked packets needed
for determining the change in path $\mathcal{P}$, once its topology is
known, is given by $l = \lceil \ldots + \delta \rceil$, where $\delta \in \mathbb{N}$ is a
constant which determines the rate with which the (union)
upper-bound of the probability of error decays with $p$. We
choose $\delta = 2$, which upper-bounds the probability of error by
which is approximately 2^^^ for our case. Figure 3 makes
the comparison between the number of marked packets needed
for the usual non-incremental traceback and the incremental
version for deterministic path encoding. As observed, the
incremental version of traceback proves to be better: the
number of marked packets is far smaller and the rate of growth
of marked packets needed, with $d$ increasing, is also smaller
than non-incremental traceback.
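
The field order chosen in this section for 16-bit IDs can be confirmed with a short check. This is a sketch using plain trial division; the helper names `is_prime` and `smallest_prime_above` are hypothetical.

```python
def is_prime(n):
    """Trial division; adequate for numbers of this size."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def smallest_prime_above(n):
    """Smallest prime strictly greater than n."""
    candidate = n + 1
    while not is_prime(candidate):
        candidate += 1
    return candidate


max_id = 2 ** 16 - 1  # largest 16-bit node ID
p = smallest_prime_above(max_id)
print(p, p == 2 ** 16 + 1)
```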

The average number of marked packets needed for randomized
path encoding for both the non-incremental and incremental
traceback versions is also shown in Figure 3. Here, we
consider the case when the nodes mark packets independently
of each other with probability $q = 0.04$ ($q_i = q\ \forall i$). This
gives $f_0 = (1 - q)^d = (0.96)^d$ and $f_1 = q(1 - q)^{d-1} =
0.04(0.96)^{d-1}$. The average number of marked packets needed
by the conventional traceback is $d \left( \frac{1 - f_0}{f_1} \right)$ and the average
number of packets needed by the incremental traceback is
$\lceil \ldots + 2 \rceil \left( \frac{1 - f_0}{f_1} \right)$. In this case, the average number of
marked packets needed for incremental traceback increases
significantly compared to the deterministic path encoding case,
but it is still less than the number needed by the conventional
randomized path encoding version of traceback.

Fig. 3. Comparison of the number of marked packets needed for determining
$\mathcal{P}$ for both deterministic and randomized full path encoding versions.

Fig. 4. Comparison of the quantity $(1 - f_0)/f_1$ and its variation with
respect to $d$ for various marking schemes.

We next analyze the performances of Schemes 1 and 2
(Section V-A) in reducing the average order of marked packets
needed and compare them with the scheme in [7], i.e., where all
nodes mark packets with the same probability (let us call this
Scheme 0). For both Schemes 1 and 2, we assume $h_0 = 5$,
i.e., once a node sees a marked packet of hop-count 5 or more,
it does not mark it. We consider $q = 0.2$ for Scheme 1 and
$a = 0.5$ for Scheme 2. Then for $d \ge h_0 = 5$, the fractions
of unmarked packets are 32% and 30% for Schemes 1 and
2 respectively, which seems reasonable. Figure 4 depicts the
variation of $\frac{1 - f_0}{f_1}$ with $d$. Clearly, for Schemes 1 and 2, the
value becomes a constant while for Scheme 0, it continues to
grow. Thus, Schemes 1 and 2 reduce the average order
of number of marked packets needed to perform traceback.
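
The comparison of the three schemes can be reproduced approximately with a short script. It is a sketch that evaluates the product expressions for $f_0$ and $f_1$ under hop-count-dependent marking, with $h_0 = 5$, $q = 0.2$ and $a = 0.5$ as in the text; the value $q = 0.04$ used for Scheme 0 is an assumption borrowed from the Figure 3 setting, and the function names are hypothetical.

```python
from math import prod

H0 = 5


def stats(q, d):
    """Return (f0, (1 - f0) / f1) for marking-probability function q and path length d."""
    f0 = prod(1 - q(h) for h in range(1, d + 1))
    f1 = q(1) * prod(1 - q(h) for h in range(2, d + 1))
    return f0, (1 - f0) / f1


def scheme0(h):
    return 0.04  # assumed constant marking probability


def scheme1(h):
    return 0.2 if h <= H0 else 0.0


def scheme2(h):
    return 0.5 ** h if h <= H0 else 0.0


for d in (5, 10, 20, 40):
    row = [round(stats(s, d)[1], 3) for s in (scheme0, scheme1, scheme2)]
    print(d, row)

print("unmarked fractions:", stats(scheme1, H0)[0], stats(scheme2, H0)[0])
```

For $d \ge h_0$ the Scheme 1 and Scheme 2 columns stay fixed while the Scheme 0 column keeps growing with $d$, which is the qualitative behaviour described for Figure 4.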

VIII. Conclusion and Remarks 

In this paper, we present a mechanism of performing incre- 
mental algebraic traceback in networks with a topology that 
is changing much slower than its rate of communication. We 
initialize the system using an established algebraic traceback 
mechanism, and then track the network as it evolves using 
an efficient incremental traceback mechanism. The decoding 
process is altered from a traditional traceback scheme. This 
decoding mechanism actively searches for a change in network 
topology in the incoming packets, and when one is detected, 
it determines what the change is (insertion or deletion), where 
it has occurred in the network and what the new ID, if any, 
of the inserted node is. We also show that, for the case with
no ID spoofing among nodes, the resulting algorithm requires
$O(\log d)$ marked packets and a complexity of $O(d \log d)$
before it can declare success in determining the ID of the change
in a path of $d$ nodes. We also show, very straightforwardly,
that this packet overhead is order-wise optimal.



Note that our proof mechanisms closely resemble random
coding proofs in information theory for discrete additive
memoryless channels. Algorithms I through III can be viewed as
"achievability" proofs from conventional information theory,
while, in this case, the converse is straightforward. A final
remark is that, when we swap a more stringent probability
1 (zero error) requirement for tracking the changing path in
a dynamic network with an arbitrarily small error constraint,
the resulting time taken and complexity of the incremental
traceback algorithm decrease substantially.